128-2012: Constructing a Credit Risk Scorecard Using Predictive Clusters
Abstract
Traditionally, cluster analysis has been used as a descriptive tool: the algorithm creates groups of observations based on their characteristics. This paper proposes using cluster analysis as part of a predictive algorithm. The methodology first determines which cluster a prospective client belongs to and then calculates a specific credit risk scorecard for each cluster. The results show that this approach performs better than using a single scorecard for all prospective clients.

INTRODUCTION

Globalization has opened markets and intensified competition, making innovation a key factor in competitiveness. For this reason, every idea should be world class, focused on increasing efficiency, productivity and quality while remaining cost efficient. This study proposes a way to innovate by building new solutions from well-known techniques such as cluster analysis.

The main objective of this paper is to improve the development of credit risk scorecards by using cluster analysis not only as a methodology to classify individuals with specific characteristics (variables), but also as part of a prediction process, in order to classify and profile efficiently the new clients that join the financial business. To do this, two different methodologies are compared on four different databases in order to obtain an unbiased conclusion.

The first methodology consists of developing scorecard models for the entire population using a logistic regression and a Multi-Layer Perceptron (MLP) neural network.

The second methodology involves four steps. The first step is to carry out a cluster analysis for the entire population using the K-means and Kohonen self-organizing map algorithms. The second step is to develop an algorithm that assigns a new client to one of the resulting clusters; the techniques used for this purpose are multinomial logistic regression, an MLP neural network, minimum Euclidean distance, minimum adjusted distance and Mahalanobis distance. The third step is to develop a scorecard for each of the clusters, again using a logistic regression and an MLP neural network. Lastly, a final score is computed using three different techniques: cluster score, score ensemble and classifier average vote ensemble.

Finally, the two methodologies are compared using the F1 score as the measure of comparison.

This paper is divided into five sections. First, some descriptive statistics of the databases used in the analysis are presented. Then, the most general concepts of the methodologies used throughout the paper are introduced. Subsequently, the modeling process is explained, together with the particularities of the algorithms and the measures of comparison applied in this paper. The fourth section shows the experimental results, and the final section presents the conclusions.

DATA

In order to perform a complete analysis of the methodology presented in this paper and to obtain unbiased, significant results, four different databases from different products of a financial institution were used to develop credit risk scorecards. Using the default definition found for each specific population, the clients were classified as good or bad. Because the clients' credit information is confidential, the variables were renamed X1, X2, ..., Xn. Table 1 presents, for each of the four databases, the number of good and bad clients after applying the default definition, the bad rate and the number of variables used. The original data is also randomly divided into three different datasets, used for scorecard development, validation and the stability test.

Data Mining and Text Analytics, SAS Global Forum 2012
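As an illustration of the second methodology outlined in the introduction, the following minimal Python sketch assigns a new client to a cluster by minimum Euclidean distance, applies that cluster's own logistic scorecard (the "cluster score" option), and computes the F1 score used as the measure of comparison. It is not the authors' implementation: the centroids, the scorecard coefficients, the function names and the choice of "bad" (coded as 1) as the positive class are all hypothetical assumptions made for this example.

```python
import math

# Hypothetical cluster centroids, e.g. as produced by a prior K-means run
# on two standardized variables (X1, X2). These values are illustrative only.
CENTROIDS = [(0.0, 0.0), (5.0, 5.0)]

# Hypothetical per-cluster logistic scorecards: (intercept, weights).
# In practice each would be fitted on the clients of its own cluster.
SCORECARDS = [(-1.0, (0.8, 0.2)), (2.0, (-0.3, -0.1))]


def nearest_cluster(x, centroids):
    """Return the index of the centroid with minimum Euclidean distance to x."""
    distances = [math.dist(x, c) for c in centroids]
    return distances.index(min(distances))


def logistic_score(x, intercept, weights):
    """Logistic model output, read here as the estimated probability of 'bad'."""
    z = intercept + sum(w * v for w, v in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def cluster_score(x):
    """Assign x to a cluster, then score it with that cluster's scorecard."""
    k = nearest_cluster(x, CENTROIDS)
    intercept, weights = SCORECARDS[k]
    return k, logistic_score(x, intercept, weights)


def f1_score(y_true, y_pred, positive=1):
    """F1 = harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    cluster, score = cluster_score((4.0, 6.0))
    print(cluster, round(score, 4))
    print(f1_score([1, 1, 0, 0], [1, 0, 1, 0]))
```

The other assignment techniques named in the introduction (for example Mahalanobis distance or a multinomial logistic regression) would replace only `nearest_cluster`; the score ensemble and average vote options would combine the outputs of several scorecards rather than using the assigned cluster's scorecard alone.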
Similar resources
Matrix Sequential Hybrid Credit Scorecard Based on Logistic Regression and Clustering
The Basel II Accord pointed out benefits of credit risk management through internal models to estimate Probability of Default (PD). Banks use default predictions to estimate the loan applicants’ PD. However, in practice, PD is not useful and banks applied credit scorecards for their decision making process. Also the competitive pressures in lending industry forced banks to use profit scorecards...
Reject Inference Techniques Implemented in Credit Scoring for SAS® Enterprise Miner™
Many business elements are used to develop credit scorecards. Reject inference, related to the issue of sample bias, is one of the key processes required to build relevant application scorecards and is vital in creating successful scorecards. Reject inference is used to assign a target class (that is, a good or bad designation) to applications that were rejected by the financial institution and...
Improving Credit Risk Scorecards with Memory-Based Reasoning to Reject Inference with SAS Enterprise Miner
Many business elements are used to develop credit scorecards. Reject inference, related to the issue of sample bias, is one of the key processes required to build relevant application scorecards and is vital in creating successful scorecards. Reject inference is used to assign a target class (that is, a good or bad designation) to applications that were rejected by the financial institution and...
When to rebuild or when to adjust scorecards
Data based scorecards, such as those used in credit scoring, age with time and need to be rebuilt or readjusted. Unlike the huge literature on modelling the replacement and maintenance of equipment there have been hardly any models which deal with this problem for scorecards. This paper identifies an effective way of describing the predictive ability of the scorecard and from this describes a s...
Does segmentation always improve model performance in credit scoring?
Credit scoring allows for the credit risk assessment of bank customers. A single scoring model (scorecard) can be developed for the entire customer population, e.g. using logistic regression. However, it is often expected that segmentation, i.e. dividing the population into several groups and building separate scorecards for them, will improve the model performance. The most common statistical ...
Publication date: 2012